feat(hpc): Fingerprint BindSpace API + VectorWidth config + WHT + BF16 tile GEMM + i2 quantization by AdaWorldAPI · Pull Request #109 · AdaWorldAPI/ndarray

AdaWorldAPI · 2026-04-18T21:48:14Z

Summary

Extends ndarray with the hardware + type primitives that lance-graph's
cognitive-shader-driver consumes. Everything in the contract crate depends
on these directly (Fingerprint, VectorWidth, SIMD lane views, BLAS-adjacent
kernels, quantization helpers).

`src/hpc/fingerprint.rs` (+236 lines)

Full BindSpace-compatible API on Fingerprint<N>:

Bit ops: get_bit, set_bit, toggle_bit
Algebra: bind (XOR), and, or, not, permute
Constructors: random(seed), orthogonal(seed), from_content(&str)
Stats: density, hamming (alias)
Bundling: bundle(items: &[&Self]) — majority vote
SIMD views: chunks_u64x8, chunks_u8x64 — zero-copy lane iteration
Width config: VectorWidth enum + LazyLock singleton + vector_config()
reading NDARRAY_VECTOR_WIDTH env var (production 16K default)

Six new types are now part of the public surface via simd re-exports.
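The bind and bundle operations listed above can be illustrated with a minimal stand-alone sketch. `SketchFp` is a hypothetical stand-in, not the PR's `Fingerprint<N>`; the field name `words` follows the diff excerpts in the review, and resolving ties toward 0 (strict majority) is an assumption, since the PR text does not say how even-sized bundles are handled.

```rust
/// Hypothetical stand-in for a fingerprint of N 64-bit words.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SketchFp<const N: usize> {
    words: [u64; N],
}

impl<const N: usize> SketchFp<N> {
    /// Bind is word-wise XOR, so binding twice with the same key is the identity.
    fn bind(&self, other: &Self) -> Self {
        let mut words = [0u64; N];
        for i in 0..N {
            words[i] = self.words[i] ^ other.words[i];
        }
        Self { words }
    }

    /// Hamming distance: number of differing bits.
    fn hamming(&self, other: &Self) -> u32 {
        self.words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }

    /// Majority vote per bit position. Assumption: a bit is set only when
    /// strictly more than half of the items have it set (ties become 0).
    fn bundle(items: &[&Self]) -> Self {
        let mut words = [0u64; N];
        for w in 0..N {
            for bit in 0..64 {
                let ones = items
                    .iter()
                    .filter(|fp| (fp.words[w] >> bit) & 1 == 1)
                    .count();
                if 2 * ones > items.len() {
                    words[w] |= 1u64 << bit;
                }
            }
        }
        Self { words }
    }
}
```

With an odd number of inputs the tie rule never applies, which is one reason bundles are often built from odd-sized sets.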

`src/hpc/quantized.rs` (+48 lines)

quantize_f32_to_i2 / dequantize_i2_to_f32 — 2-bit precision for the
cascade path
dequantize_i8_to_f32 — paired reverse for the existing i8 codec
QuantParams is now public
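The PR text does not show the signatures or bit layout of `quantize_f32_to_i2` and `dequantize_i2_to_f32`, so the following is only a sketch of what 2-bit quantization can look like. The function names, the symmetric scale, the level set {-2, -1, 0, 1}, and the four-values-per-byte packing are all assumptions made for illustration.

```rust
/// Hypothetical 2-bit quantizer: maps each value to one of {-2, -1, 0, 1}
/// and packs four 2-bit codes per byte (lowest bits first).
/// Returns the packed bytes and the scale needed to dequantize.
fn quantize_i2_sketch(xs: &[f32]) -> (Vec<u8>, f32) {
    let max_abs = xs.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    // Assumption: the largest magnitude maps to level 2 (positive values clamp to 1).
    let scale = if max_abs > 0.0 { max_abs / 2.0 } else { 1.0 };
    let mut packed = vec![0u8; (xs.len() + 3) / 4];
    for (i, &x) in xs.iter().enumerate() {
        let q = (x / scale).round().clamp(-2.0, 1.0) as i8;
        // Keep the low two bits of the two's-complement code.
        packed[i / 4] |= ((q as u8) & 0b11) << ((i % 4) * 2);
    }
    (packed, scale)
}

/// Inverse of the sketch above; `len` is the original element count.
fn dequantize_i2_sketch(packed: &[u8], scale: f32, len: usize) -> Vec<f32> {
    (0..len)
        .map(|i| {
            let bits = (packed[i / 4] >> ((i % 4) * 2)) & 0b11;
            // Sign-extend the 2-bit two's-complement code to i8.
            let q = ((bits << 6) as i8) >> 6;
            q as f32 * scale
        })
        .collect()
}
```

Because two's-complement 2-bit codes are asymmetric, the largest positive input is clamped one level below the largest negative one; a real codec might pick a ternary or offset scheme instead.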

`src/hpc/fft.rs` (+135 lines)

wht_f32(&mut [f32]) — Walsh–Hadamard Transform with F32x16 SIMD butterfly
wht_f32_new(&[f32]) — functional variant

Used by the cognitive shader's HAD-cascade codec.
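For readers unfamiliar with the transform, a scalar reference of the in-place butterfly might look like the sketch below. It is not the PR's `wht_f32`: the name `wht_scalar_sketch` is hypothetical, no SIMD is used, and the output is left unnormalized, which is an assumption since the PR text does not state a normalization convention.

```rust
/// Scalar, in-place, unnormalized fast Walsh-Hadamard transform.
/// Panics unless the length is a power of two (an empty slice also panics).
fn wht_scalar_sketch(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        // Each block of 2*h elements is split into halves that are
        // combined pairwise: (a, b) -> (a + b, a - b).
        for block in data.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (x, y) = (*a, *b);
                *a = x + y;
                *b = x - y;
            }
        }
        h *= 2;
    }
}
```

Applying the unnormalized transform twice returns the input scaled by the length, which gives a cheap correctness check for any optimized variant.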

`src/hpc/bf16_tile_gemm.rs` (+198 lines) + `src/hpc/amx_matmul.rs` (+44 lines)

bf16_tile_gemm — AMX TDPBF16PS primitive with AVX-512 polyfill
16×16 tile matrix multiply for BF16 × BF16 → f32 accumulation
Runtime dispatch through simd_caps()
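As a rough picture of what a BF16 × BF16 → f32 tile multiply computes, here is a scalar reference sketch. It does not use AMX or AVX-512 and is not the PR's `bf16_tile_gemm`; the function names, the row-major 16×16 layout, and the truncating f32-to-BF16 conversion are assumptions chosen to keep the example short.

```rust
/// Tile edge length, matching the 16x16 tiles described above.
const TILE: usize = 16;

/// BF16 is the upper 16 bits of an f32. This sketch truncates
/// (no rounding), which is an assumption, not the PR's behavior.
fn f32_to_bf16_sketch(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Widening BF16 back to f32 is exact: shift into the high half.
fn bf16_to_f32_sketch(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

/// Scalar reference: c += a * b for one row-major 16x16 tile,
/// with BF16 inputs and f32 accumulation.
fn bf16_tile_gemm_ref(
    a: &[u16; TILE * TILE],
    b: &[u16; TILE * TILE],
    c: &mut [f32; TILE * TILE],
) {
    for i in 0..TILE {
        for k in 0..TILE {
            let aik = bf16_to_f32_sketch(a[i * TILE + k]);
            for j in 0..TILE {
                c[i * TILE + j] += aik * bf16_to_f32_sketch(b[k * TILE + j]);
            }
        }
    }
}
```

A reference like this is mainly useful as an oracle when testing a hardware path against a polyfill, since both should agree on inputs that are exactly representable in BF16.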

`src/simd.rs` (+36 lines)

Public re-exports for lance-graph consumers:

pub use crate::hpc::fingerprint::{
    Fingerprint, Fingerprint2K, Fingerprint1K, Fingerprint64K,
    VectorWidth, VectorConfig, vector_config,
};
pub use crate::hpc::bnn_cross_plane::CollapseGate;
pub use crate::hpc::bitwise::{hamming_distance_raw, popcount_raw};
pub use crate::hpc::fft::{wht_f32, wht_f32_new};
pub use crate::hpc::quantized::{
    quantize_f32_to_i4, dequantize_i4_to_f32,
    quantize_f32_to_i2, dequantize_i2_to_f32,
    quantize_f32_to_i8, dequantize_i8_to_f32, QuantParams,
};
pub use crate::hpc::cam_pq::{kmeans, squared_l2};
pub use crate::hpc::heel_f64x8::cosine_f32_to_f64_simd;

Consumers write `use ndarray::simd::{Fingerprint, VectorWidth, ...};`
and never touch internal `hpc::*` paths.

`.claude/knowledge/cognitive-shader-foundation.md` (+137 lines)

Agent knowledge doc parallel to lance-graph's. Explains the SIMD floor
(F32x16), the 4-tier dispatch (F32x16 → VNNI2 → AVX512-VNNI → AMX),
the Fingerprint const-generic model, the VectorWidth LazyLock
config path, and which public types lance-graph consumes.

`.claude/agents/*.md` model bumps

Four agents (l3-strategist, migration-tracker, product-engineer,
vector-synthesis) updated to the Opus 4.7 model tag.

Test plan

cargo test --lib fingerprint — 21 passing
cargo check — clean
Existing ndarray tests unaffected (1639 filtered-out tests in the
pattern-matched run are from other modules, all passing in full cargo test)
Downstream consumer verified: cargo test -p lance-graph-contract
and cargo test -p cognitive-shader-driver compile and pass with
these additions (tested during lance-graph PR #206)

Downstream impact

lance-graph's cognitive-shader-driver and lance-graph-contract both
import from ndarray::simd::* — this PR is what lets their PR #206
compile. Merging unblocks the Tier 0 quick wins (Q2 Cargo.toml pin,
AriGraph wiring, cockpit endpoints).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

QW4:

chunks_u64x8(): iterate as 8-word batches for VPOPCNTDQ
chunks_u8x64(): iterate as 64-byte batches for U8x64 ops
bundle(): majority vote across multiple fingerprints

These enable the Layer 1 cascade: sweep a fingerprint column via SIMD-width chunks, then bundle consensus across agents.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Adds missing BindSpace API methods:

orthogonal(seed): golden-ratio-seeded quasi-orthogonal fingerprint
or(): bitwise OR
bundle(&[&Self]): majority vote across multiple fingerprints
chunks_u64x8(): iterate as 8-word batches for AVX-512 VPOPCNTDQ
chunks_u8x64(): iterate as 64-byte batches for U8x64 ops

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3c77050c7f

chatgpt-codex-connector · 2026-04-18T21:51:08Z

+    pub fn chunks_u64x8(&self) -> impl Iterator<Item = &[u64]> {
+        self.words.chunks(8)


Return only full SIMD lanes in chunk iterators

chunks_u64x8 uses .chunks(8), so Fingerprint<N> values where N % 8 != 0 produce a final slice shorter than 8 words; callers following the documented U64x8::from_slice(chunk) usage will panic on that tail chunk. Because Fingerprint is a public const-generic type (and this file already uses small non-multiple test sizes), this API can crash on valid inputs unless it uses chunks_exact(8) (or otherwise handles the remainder explicitly).
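One way to act on this suggestion is sketched below, assuming a hypothetical `LaneFp` stand-in with the same `words` field as the diff excerpt. Returning `std::slice::ChunksExact` guarantees every yielded slice has exactly 8 words and still exposes the tail through `remainder()`, so callers can decide how to handle it.

```rust
/// Hypothetical stand-in for the PR's fingerprint type.
struct LaneFp<const N: usize> {
    words: [u64; N],
}

impl<const N: usize> LaneFp<N> {
    /// Yields only full 8-word lanes; a tail shorter than 8 words is
    /// available through `ChunksExact::remainder` instead of being
    /// returned as a short final chunk.
    fn chunks_u64x8(&self) -> std::slice::ChunksExact<'_, u64> {
        self.words.chunks_exact(8)
    }
}

/// Example consumer: popcount over full lanes, then over the remainder.
fn popcount_by_lanes<const N: usize>(fp: &LaneFp<N>) -> u32 {
    let mut lanes = fp.chunks_u64x8();
    let mut total = 0u32;
    for lane in lanes.by_ref() {
        // lane.len() == 8 is guaranteed here, so a fixed-width SIMD
        // load from this slice could not run short.
        total += lane.iter().map(|w| w.count_ones()).sum::<u32>();
    }
    // Scalar fallback for the 0..=7 trailing words.
    total += lanes.remainder().iter().map(|w| w.count_ones()).sum::<u32>();
    total
}
```

The trade-off is that callers who ignore `remainder()` silently skip the tail words, so the doc comment on the real method would need to say so explicitly.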


chatgpt-codex-connector · 2026-04-18T21:51:09Z

+    pub fn orthogonal(seed: u64) -> Self {
+        Self::random(seed.wrapping_mul(0x9E3779B97F4A7C15))


Prevent orthogonal(0) from collapsing to zero fingerprint

orthogonal forwards the transformed seed into random; when seed == 0, this still passes 0, and the xorshift state remains all-zero, yielding an all-zero fingerprint every time. That breaks the method’s stated quasi-orthogonal behavior and creates a degenerate vector if callers generate seeds starting at 0 (a common indexing pattern), so zero should be remapped/mixed to a non-zero RNG state.
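The failure mode and one possible fix can be sketched as follows. The body of `random` is not shown in the PR text, so `xorshift64` and `random_words` below are assumed stand-ins based on the review's description of an xorshift state; `orthogonal_words` is a hypothetical fix, not the change that was merged.

```rust
/// Golden-ratio constant used in the PR's `orthogonal`.
const GOLDEN: u64 = 0x9E37_79B9_7F4A_7C15;

/// Plain xorshift64 step. A zero state stays zero forever.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Assumed stand-in for `random(seed)`: fills N words from xorshift64.
fn random_words<const N: usize>(seed: u64) -> [u64; N] {
    let mut state = seed;
    let mut out = [0u64; N];
    for w in out.iter_mut() {
        *w = xorshift64(&mut state);
    }
    out
}

/// Hypothetical fix: offset the seed before multiplying so that
/// seed 0 no longer maps to state 0, and guard the one remaining
/// input that would still produce a zero state.
fn orthogonal_words<const N: usize>(seed: u64) -> [u64; N] {
    let mut state = seed.wrapping_add(1).wrapping_mul(GOLDEN);
    if state == 0 {
        // Only seed == u64::MAX reaches this branch.
        state = GOLDEN;
    }
    random_words(state)
}
```

The fallback makes `u64::MAX` share a state with seed 0; since there are more seeds than non-zero states, one such collision is unavoidable with any remapping of this kind.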


claude added 2 commits April 18, 2026 15:16

AdaWorldAPI merged commit 4784945 into master Apr 18, 2026
5 of 14 checks passed

AdaWorldAPI merged 2 commits into master from claude/teleport-session-setup-wMZfb
